Profilers
Tracy
-
Tracy .
-
Releases .
-
OpenGL, Vulkan, Direct3D 11/12, Metal, OpenCL, CUDA.
-
Direct support for C, C++, Lua, Python and Fortran. Bindings for Rust, Zig, C#, OCaml, Odin, etc.
-
Windows, Linux, FreeBSD, Android, WSL, OSX, iOS, QNX.
Advantages of Tracy
-
You can profile CPU, GPU, locks, memory allocations, context switches, and more.
-
Statistical information about zones, trace comparisons, and the inclusion of inline function frames in call stacks (even in statistics of sampled stacks) are features unique to Tracy.
-
Tracy uses low-level kernel APIs, or even raw assembly, where other profilers rely on layers of abstraction.
-
Tracy has been multi-platform from the very beginning, on both the client and the server side. Other profilers tend to have Windows-specific graphical interfaces.
-
Tracy can handle millions of frames, zones, memory events, and so on, while other profilers tend to target very short captures.
-
Tracy provides a mapping of source code to the assembly, with detailed information about the cost of executing each instruction on the CPU.
Server and Client
-
In Tracy terminology, the profiled application is a client, and the profiler itself is a server. It was named this way because the client is a thin layer that just collects events and sends them for processing and long-term storage on the server. The fact that the server needs to connect to the client to begin the profiling session may be a bit confusing at first.
-
You may profile a game on a mobile phone over the wireless connection, with the profiler running on a desktop computer. Or you can run the client and server on the same machine, using a localhost connection. It is also possible to embed the visualization front-end in the profiled application, making the profiling self-contained.
Notes
-
I had problems using Tracy together with ASan, which sometimes led to game crashes. Keep that in mind.
Performance
Zones gaps
-
I was getting 200-470ns in zone gaps.
-
Check https://github.com/wolfpld/tracy/issues/1212
-
slomp (Tracy github contributor):
"As for the "Tracy achieves such small overhead (only 2.25 ns)", well, there's a lot of nuance here (with rdtsc and such), but put simply, on a modern x64 machine, an "empty" zone should be between 10ns to 50ns."
-
Maybe this is related to the allocation made from ___tracy_alloc_srcloc.
-
Makes sense. Source location information in most Tracy Zone annotations is handled at compile time and stored in static, read-only data blocks in the program binary, so no dynamic allocation needs to happen.
-
-
-
The first zone takes longer than others.
-
This happened for me and 'slomp'.
-
wolfpld (Tracy author):
-
Queue block allocation cost is amortized.
-
-
-
After disabling call stack capture, I got around 80-200ns between zones, sometimes down to 30ns. So the problem was indeed the call stack (I was using a depth of 2 before). This could probably be improved further by not using the C API and avoiding the extra allocations. I'll check whether it's possible to do something similar to the macro in Odin, although the language doesn't offer much support for metaprogramming :/
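To sanity-check numbers like the ones above without Tracy in the loop, a minimal sketch of a gap measurement follows. The helper names `measure_gaps_ns` and `median_ns` are hypothetical and not part of Tracy; the sketch only measures back-to-back `std::chrono::steady_clock` reads, so it gives a rough floor for the timer cost on the machine, not the cost of a Tracy zone.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Tracy: takes `count` back-to-back
// timestamps and returns the gaps between neighbours in nanoseconds.
std::vector<std::int64_t> measure_gaps_ns(std::size_t count) {
    assert(count >= 2 && "need at least two timestamps to form a gap");
    using clock = std::chrono::steady_clock;
    std::vector<clock::time_point> stamps;
    stamps.reserve(count);  // allocate once so vector growth does not pollute the gaps
    for (std::size_t i = 0; i < count; ++i) {
        // An instrumented build would open and close an empty zone here.
        stamps.push_back(clock::now());
    }
    std::vector<std::int64_t> gaps;
    gaps.reserve(count - 1);
    for (std::size_t i = 1; i < stamps.size(); ++i) {
        gaps.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(
                           stamps[i] - stamps[i - 1]).count());
    }
    return gaps;
}

// Median of the gaps (upper median for even sizes); 0 for an empty input.
std::int64_t median_ns(std::vector<std::int64_t> gaps) {
    if (gaps.empty()) return 0;
    std::sort(gaps.begin(), gaps.end());
    return gaps[gaps.size() / 2];
}
```

Looking at the median rather than the mean keeps a slow first sample from dominating the result, which fits the observation that the first zone takes longer than the others.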
Start profiling
-
Profiling debug builds makes little sense, as the unoptimized code and additional checks (asserts, etc.) completely change how the program behaves.
-
In the default configuration, Tracy is disabled. This way, you don’t have to worry that the production builds will collect profiling data. To enable profiling, you will probably want to create a separate build configuration, with the TRACY_ENABLE define.
-
Make sure that this macro is defined for all files across your project (e.g. it should be specified in the CFLAGS variable, which is always passed to the compiler, or in an equivalent way), and not as a #define in just some of the source files.
-
Tracy does not consider the value of the definition, only whether the macro is defined or not.
-
Be careful not to make the mistake of assigning numeric values to Tracy defines, which could lead you to be puzzled why constructs such as TRACY_ENABLE=0 don’t work as you expect them to.
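The difference between "defined" and "defined to a true value" can be shown with plain preprocessor checks. The sketch below uses a hypothetical macro, `DEMO_PROFILER_ENABLE`, as a stand-in so that it does not touch any real Tracy define; it illustrates the general C++ behaviour, not Tracy's source.

```cpp
// Hypothetical stand-in for a Tracy define, deliberately given the value 0.
#define DEMO_PROFILER_ENABLE 0

// Presence check: the kind of test the notes say Tracy performs.
constexpr bool enabled_by_presence() {
#ifdef DEMO_PROFILER_ENABLE
    return true;   // taken: the macro exists, its value is ignored
#else
    return false;
#endif
}

// Value check: what someone writing "=0" probably expects to happen.
constexpr bool enabled_by_value() {
#if DEMO_PROFILER_ENABLE
    return true;
#else
    return false;  // taken: the value 0 is false in an #if expression
#endif
}

static_assert(enabled_by_presence(), "a macro defined as 0 is still defined");
static_assert(!enabled_by_value(), "the same macro is false when its value is tested");
```

With a presence check, the only way to turn the feature off is to leave the macro undefined altogether.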
-
-
In addition, you should enable usage of the native architecture of your CPU (e.g. -march=native) to leverage the expanded instruction sets, which may not be available in the default baseline target configuration.
-
On Unix, make sure that the application is linked with the libpthread and libdl libraries. BSD systems will also need to be linked with libexecinfo.
When the profiling starts
-
By default, Tracy will begin profiling even before the program enters the main function.
-
However, suppose you don’t want to perform a full capture of the application lifetime. In that case, you may define the TRACY_ON_DEMAND macro, which will enable profiling only when there’s an established connection with the server.
Short-lived Apps
-
In case you want to profile a short-lived program (for example, a compression utility that finishes its work in one second), set the TRACY_NO_EXIT environment variable to 1. With this option enabled, Tracy will not exit until an incoming connection is made, even if the application has already finished executing. If your platform doesn’t support an easy setup of environment variables, you may also add the TRACY_NO_EXIT define to your build configuration, which has the same effect.
-
You should note that if on-demand profiling is disabled (which is the default), then the recorded events will be stored in the system memory until a server connection is made and the data can be uploaded.
Client connection
-
By default, the Tracy client will announce its presence to the local network. If you want to disable this feature, define the TRACY_NO_BROADCAST macro. The program name that is sent out in the broadcast messages can be customized by using the TracySetProgramName(name) macro.
-
By default, the Tracy client will listen on all network interfaces. If you want to restrict it to only listening on the localhost interface, define the TRACY_ONLY_LOCALHOST macro at compile-time, or set the TRACY_ONLY_LOCALHOST environment variable to 1 at runtime.
-
If you need to use a specific Tracy client address, such as QNX requires, define the TRACY_CLIENT_ADDRESS macro at compile-time as the desired string address.
-
By default, the Tracy client will listen on IPv6 interfaces, falling back to IPv4 only if IPv6 is unavailable. If you want to restrict it to only listening on IPv4 interfaces, define the TRACY_ONLY_IPV4 macro at compile-time, or set the TRACY_ONLY_IPV4 environment variable to 1 at runtime.
-
By default, the client and server communicate on the network using port 8086. The profiling session utilizes the TCP protocol, and the client sends presence announcement broadcasts over UDP.
-
Suppose for some reason you want to use another port. In that case, you can change it using the TRACY_DATA_PORT macro for the data connection and the TRACY_BROADCAST_PORT macro for client broadcasts. Alternatively, you may change both ports at the same time by declaring the TRACY_PORT macro (specific macros listed before have higher priority). You may also change the data connection port without recompiling the client application by setting the TRACY_PORT environment variable. If a custom port is not specified and the default listening port is already occupied, the profiler will automatically try to listen on a number of other ports.
-
To enable network communication, Tracy needs to open a listening port. Make sure it is not blocked by an overzealous firewall or anti-virus program.
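The port precedence described above (specific macro, then general macro, then the default 8086, with an environment variable able to override the data port at runtime) can be sketched as follows. Everything here is hypothetical and for illustration only: `DEMO_DATA_PORT`, `DEMO_PORT`, and `resolve_data_port` are made-up names, and the validation of the environment value is an assumption, not a description of how Tracy parses it.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical names throughout; this mirrors the precedence in the notes,
// it is not taken from Tracy's source.
constexpr int kDefaultPort = 8086;

// Compile-time layer: specific macro > general macro > built-in default.
#if defined(DEMO_DATA_PORT)
constexpr int kCompiledDataPort = DEMO_DATA_PORT;
#elif defined(DEMO_PORT)
constexpr int kCompiledDataPort = DEMO_PORT;
#else
constexpr int kCompiledDataPort = kDefaultPort;
#endif

// Runtime layer: a valid environment value overrides the compiled-in port.
// Rejecting malformed or out-of-range values is an assumption of this sketch.
int resolve_data_port(const char* env_value) {
    // The compiled-in fallback must itself be a usable port number.
    assert(kCompiledDataPort > 0 && kCompiledDataPort <= 65535);
    if (env_value != nullptr) {
        char* end = nullptr;
        const long parsed = std::strtol(env_value, &end, 10);
        const bool whole_string = (end != env_value) && (*end == '\0');
        if (whole_string && parsed > 0 && parsed <= 65535) {
            return static_cast<int>(parsed);
        }
    }
    return kCompiledDataPort;
}
```

A caller would pass the raw environment value, for example `resolve_data_port(std::getenv("TRACY_PORT"))`, so that an unset variable falls through to the compiled-in port.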
-
-
"Run the profiled application (e.g. demo) in privileged mode (sudo/administrator) to enable even more features in Tracy."
-
The author of odin-tracy said that.
-
Instrumenting the App
-
The entire user-facing interface is contained in the public/tracy/Tracy.hpp header file.
Naming Threads
-
tracy::SetThreadName(name)
-
Set thread names for proper identification of threads.
-
Tracy will try to capture thread names through operating system data if context switch capture is active. However, this is only a fallback mechanism, and it shouldn’t be relied upon.
-
Exposed in the public/common/TracySystem.hpp header.
-
-
tracy::SetThreadNameWithHint(name, int32_t groupHint)
-
This hint is an arbitrary number that is used to group threads together in the profiler UI.
-
The default value, which is also the value for the main thread, is zero.
-
Zones
-
tracy.ZoneNC("worker doing stuff", 0xff0000);
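The Odin call above has a C++ counterpart in the `ZoneScopedNC(name, color)` macro from Tracy.hpp. The sketch below assumes Tracy's `public` directory is on the include path; when the header is not found, it falls back to an empty macro so the snippet still compiles on its own. The function `sum_range` and the `DEMO_HAVE_TRACY_HEADER` guard are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Use the real header when it is available, otherwise a no-op stand-in.
#if defined(__has_include)
#  if __has_include("tracy/Tracy.hpp")
#    include "tracy/Tracy.hpp"
#    define DEMO_HAVE_TRACY_HEADER 1
#  endif
#endif
#ifndef DEMO_HAVE_TRACY_HEADER
#  define ZoneScopedNC(name, color)  // fallback: expands to nothing
#endif

// Hypothetical worker: sums the integers in [begin, end) inside a named,
// colored zone. The zone ends automatically when the function returns.
std::uint64_t sum_range(std::uint32_t begin, std::uint32_t end) {
    assert(begin <= end);
    ZoneScopedNC("worker doing stuff", 0xff0000);
    std::uint64_t sum = 0;
    for (std::uint32_t i = begin; i < end; ++i) {
        sum += i;
    }
    return sum;
}
```

Because the zone is tied to the enclosing scope, no explicit end call is needed; without TRACY_ENABLE the macro compiles away and the function behaves like ordinary code.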
Impressions
-
(2025-11-04)
-
Compilation:
-
Following the compilation steps works fine.
-
The whole thing weighs 1.33 GB after compiling everything. It uses Visual Studio, C++, and MSVC. So yeah, it's unpleasant.
-
-
Setup
-
Tracy Profiler supports MSVC, GCC, and clang.
-
You will need to use a reasonably recent version of the compiler due to the C++11 requirement.
-
All the files required to integrate your application with Tracy are contained in the public directory.
-
With the source code included in your project, add the public/TracyClient.cpp source file to the IDE project or makefile. You’re done.
-
Tracy is now integrated into the application.
CMake
-
Tracy uses the CMake build system. Unlike in most other programs, the root-level CMakeLists.txt file is only used to provide client integration. The build definition files used to create profiler executables are stored in directories specific to each utility.
-
The CMakeLists.txt file only contains the general definition of how the program should be built. To be able to actually compile the program, you must first create a build directory that takes into account the specific compiler you have on your system, the set of available libraries, the build options you specify, and so on.
-
You can do this by issuing the following command, in this case for the profiler utility:
cmake -B profiler/build -S profiler -DCMAKE_BUILD_TYPE=Release
-
Now that you have a build directory, you can actually compile the program. For example, you could run the following command:
cmake --build profiler/build --config Release --parallel
-
The build directory can be reused if you want to compile the program in the future, for example if there have been some updates to the source code, and usually does not need to be regenerated. Note that all build artifacts are contained in the build directory.
-
You can integrate Tracy with CMake by adding the git submodule folder as a subdirectory.
# set options before add_subdirectory
# available options: TRACY_ENABLE, TRACY_LTO, TRACY_ON_DEMAND, TRACY_NO_BROADCAST, TRACY_NO_CODE_TRANSFER, ...
option(TRACY_ENABLE "" ON)
option(TRACY_ON_DEMAND "" ON)
add_subdirectory(3rdparty/tracy) # target: TracyClient or alias Tracy::TracyClient
-
Link Tracy::TracyClient to any target where you use Tracy for profiling:
target_link_libraries(<TARGET> PUBLIC Tracy::TracyClient)
-
With CMake 3.11+, you can use Tracy via CMake FetchContent. In this case, you do not need to add a git submodule for Tracy manually. Add this to your CMakeLists.txt:
include(FetchContent) # provides FetchContent_Declare and FetchContent_MakeAvailable
FetchContent_Declare(
tracy
GIT_REPOSITORY https://github.com/wolfpld/tracy.git
GIT_TAG master
GIT_SHALLOW TRUE
GIT_PROGRESS TRUE
)
FetchContent_MakeAvailable(tracy)
-
Then add this to any target where you use tracy for profiling:
target_link_libraries(<TARGET> PUBLIC TracyClient)
Static Library
-
If you are compiling Tracy as a static library to link with your application, you may encounter some unexpected problems if your code does not use any symbols from the library. To avoid this, you can simply add the TracyNoop macro somewhere in your code, for example in the main function. The macro doesn’t do anything useful, but it inserts a reference that is satisfied by the static library, which results in the Tracy code being linked in and the profiler being able to work as intended.
Server steps
-
(2025-11-04) I did it this way.
git clone --recurse-submodules https://github.com/oskarnp/odin-tracy
-
While in the odin-tracy dir, with the x64 Native Tools Command Prompt for VS 20XX:
cd tracy\vcpkg
.\install_vcpkg_dependencies.bat
-
Add #include <chrono> to tracy\server\TracyView.hpp
cd tracy\profiler\build\win32
msbuild Tracy.sln -t:Build -p:Configuration=Release
-
The server executable is in odin-tracy\tracy\profiler\build\win32\x64\Release.
-
Client steps
-
(2025-11-04) I did it this way.
-
While in the odin-tracy dir:
-
cl -MT -O2 -DTRACY_ENABLE -c tracy\public\TracyClient.cpp -Fotracy
-
cl invokes MSVC.
-
-MT links against the static multithreaded CRT.
-
-O2 optimizes for speed.
-
-DTRACY_ENABLE defines the preprocessor symbol so Tracy’s instrumentation is active.
-
-c compiles only and produces an object file.
-
tracy\public\TracyClient.cpp is the source.
-
-Fotracy sets the output object file name to tracy.obj.
-
-
lib tracy.obj
-
lib creates a static library.
-
tracy.obj is the input.
-
The default output is tracy.lib unless another name is passed.
-
-
Effect: compile the Tracy client into an object using static CRT and optimizations, then pack it into a static library.
-
This will create the files:
-
odin-tracy/tracy.obj
-
Only used to generate tracy.lib.
-
-
odin-tracy/tracy.lib
-
Used by the odin-tracy binding.
-
-
-
REMEMBER TO:
-
Set -define:TRACY_ENABLE=true when building the Odin program.
-
Pre-built binaries
-
The version releases of the profiler are provided as precompiled Windows binaries for download at https://github.com/wolfpld/tracy/releases, along with the user manual.
-
You will need to install the latest Visual C++ redistributable package to use them.
-
Note that these binary releases require AVX2 instruction set support on the processor. If you have an older CPU, you will need to set a proper instruction set architecture in the project properties and build the executables yourself.
Spall
-
About .
-
The Jobs repo uses Spall in the examples:
-
Boids.
-
I prefer this one, because it's visual with RayLib.
-
-
Background.
-
Simple.
-
-
A .spall file is generated and used on the site.
-
For a non-web version, you have to buy Spall for $100.
-
Impressions :
-
I don't like it being web-based.
-
It's solid, cool, made in Odin, nice.
-
Generates local files, very simple to understand.
-
Officially supported by Odin.
-
Nvidia Nsight Graphics - GPU Trace
-
Supports a limited number of frames (e.g. up to ~1–15 frames, depending on options). Useful for small multi-frame captures where you already know the target time window.
-
(2025-10-04)
-
I had to run as an admin to use it.
-
AMD GPU Profiler (AMD RGP)
-
https://gpuopen.com/rgp/
-
RGP has historically been single-frame focused, though the driver tools have timeline features for profiling.
Intel GPA
Nsight Systems
-
https://developer.nvidia.com/nsight-systems
-
(2025-10-04)
-
Alt + Scroll:
-
Scrolls horizontally.
-
I didn't like that.
-
-
Ctrl + Scroll:
-
Zoom.
-
-
I thought the information was pretty "bad" and not specific to Vulkan.
-
I recorded 1m18s of gameplay, and it took a few minutes to generate the profile.
-